
Request metrics improvements #8420

Merged
merged 18 commits into from
Jan 16, 2025
Conversation

enahum
Contributor

@enahum enahum commented Dec 17, 2024

Summary

More improvements to the network metrics, including speed, effective latency, and, more importantly, some changes to the entry fetching endpoints.

Ticket Link

https://mattermost.atlassian.net/browse/MM-61878

Release Note

NONE

Update groupLabel to use a type definition
Include effective latency and average speed in the metrics
Include parallel and sequential request counts
Member

@isacikgoz isacikgoz left a comment


Added my comments wherever I have an idea :)

}

const urlData = Object.entries(groupData.urls);
const sumLatency = urlData.reduce((result, url) => (result + (url[1].metrics?.latency ?? 0)), 0);
Member

Should we discard 0 latency from the equation? Ideally it should never happen, though, right?

Contributor Author

It should never happen, really; the 0 here is only there to be type safe.

const effectiveParallelLatency = Math.max(...parallelLatencies, 0);
const effectiveSequentialLatency = sequentialLatencies.reduce((sum, latency) => sum + latency, 0);

const effectiveLatency = effectiveParallelLatency + effectiveSequentialLatency;
Member

Do these need to be summed or should we get the avg especially for sequential latency?

Contributor Author

As far as I'm aware, for parallel requests the effective latency is the maximum latency among all the requests, while sequential requests do add up: if you execute two sequential requests and one takes 3s and the other 5s, then you waited a total of 8s.

That is why I also added the overall average latency.
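The rule described above can be sketched as a small standalone function (the name `effectiveLatency` is illustrative, not the PR's actual helper):

```typescript
// Parallel requests overlap, so the group is only as slow as its slowest member;
// sequential requests run back to back, so their latencies add up.
function effectiveLatency(parallelLatencies: number[], sequentialLatencies: number[]): number {
    // Math.max() of an empty spread is -Infinity, so 0 is included as a floor.
    const parallelPart = Math.max(...parallelLatencies, 0);
    const sequentialPart = sequentialLatencies.reduce((sum, l) => sum + l, 0);
    return parallelPart + sequentialPart;
}

// Two sequential requests of 3000ms and 5000ms -> 8000ms total wait.
console.log(effectiveLatency([], [3000, 5000])); // 8000
// Three parallel requests -> only the slowest (700ms) matters.
console.log(effectiveLatency([200, 700, 350], [])); // 700
```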

const averageSpeedMbps = averageSpeedBps / 1_000_000; // Convert to Mbps

return {
averageSpeedMbps,
Member

I guess we can use the average speed to decide whether to download files and so on.

performance.mark('tti', options);
}

public measureTimeToInteraction() {
Member

Yay, the long-anticipated metric is here :)

Contributor Author

Yes, but it is nothing more than console output at the moment. I need to figure out where and how to include it.
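For context, one way the `tti` mark could eventually be turned into a reportable number is via the standard User Timing API (a sketch; the mark names besides `tti` are illustrative, and React Native's performance polyfill may differ from the web API):

```typescript
// Record a start-of-launch mark, then a mark when the app becomes interactive.
performance.mark('appLaunch');
// ... startup work happens here ...
performance.mark('tti');

// measure() computes the duration between the two marks and stores it as an entry.
const tti = performance.measure('timeToInteraction', 'appLaunch', 'tti');
console.log(`TTI: ${tti.duration.toFixed(1)}ms`); // could be sent as telemetry instead of logged
```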

Contributor

Should I create another metric on the server to collect this?

Contributor Author

@rahimrahman can it be optional? I think that would be very nice to have, but I haven't looked into how to include it, as it is a totally different metric collected in a different way and elsewhere in the code.


* feat(MM-61865): send network info telemetry data

* unit testing

* fix latency vs effectiveLatency

* cleanup test

* fix failing tests

* fix spelling error

* fix failing test

* more cleanup
Comment on lines 115 to 179
categorizeRequests(groupLabel: RequestGroupLabel): CategorizedRequests {
const parallel: ClientResponseMetrics[] = [];
const sequential: ClientResponseMetrics[] = [];
const groupData = this.requestGroups.get(groupLabel);
if (!groupData) {
return {parallel, sequential};
}

// Sort requests by start time
const requestsMetrics = Object.entries(groupData.urls).map((e) => e[1].metrics).filter((m) => m != null);
requestsMetrics.sort((a, b) => a.startTime - b.startTime);

let lastEndTime = 0;

for (const metrics of requestsMetrics) {
if (metrics.startTime < lastEndTime) {
// Overlapping request -> Parallel
parallel.push(metrics);
} else {
// Non-overlapping request -> Sequential
sequential.push(metrics);
}

// Update the last end time
lastEndTime = Math.max(lastEndTime, metrics.endTime);
}

return {parallel, sequential};
}

calculateAverageSpeedWithCategories = (
categorizedRequests: CategorizedRequests,
elapsedTimeInSeconds: number, // Observed total elapsed time in seconds
): { averageSpeedMbps: number; effectiveLatency: number } => {
const {parallel, sequential} = categorizedRequests;

// Step 1: Calculate total data size in bits
const totalDataBits = [...parallel, ...sequential].reduce((sum, req) => sum + (req.compressedSize * 8), 0);

// Step 2: Calculate effective latency
const parallelLatencies = parallel.map((req) => req.latency);
const sequentialLatencies = sequential.map((req) => req.latency);

const effectiveParallelLatency = Math.max(...parallelLatencies, 0);
const effectiveSequentialLatency = sequentialLatencies.reduce((sum, latency) => sum + latency, 0);

const effectiveLatency = effectiveParallelLatency + effectiveSequentialLatency;

// Step 3: Calculate data transfer time
const dataTransferTime = elapsedTimeInSeconds - (effectiveLatency / 1000);

// Handle edge case: if data transfer time is zero or negative
if (dataTransferTime <= 0) {
return {averageSpeedMbps: 0, effectiveLatency: 0};
}

// Step 4: Calculate average speed
const averageSpeedBps = totalDataBits / dataTransferTime; // Speed in bps
const averageSpeedMbps = averageSpeedBps / 1_000_000; // Convert to Mbps

return {
averageSpeedMbps,
effectiveLatency,
};
};
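The overlap rule in `categorizeRequests` above can be checked in isolation with a self-contained sketch (types simplified to just the two timestamps the rule needs):

```typescript
type Metrics = {startTime: number; endTime: number};

// Requests are sorted by start time; a request that starts before the latest
// end time seen so far overlaps an earlier one and is counted as parallel.
function categorize(requests: Metrics[]): {parallel: Metrics[]; sequential: Metrics[]} {
    const sorted = [...requests].sort((a, b) => a.startTime - b.startTime);
    const parallel: Metrics[] = [];
    const sequential: Metrics[] = [];
    let lastEndTime = 0;
    for (const m of sorted) {
        (m.startTime < lastEndTime ? parallel : sequential).push(m);
        lastEndTime = Math.max(lastEndTime, m.endTime);
    }
    return {parallel, sequential};
}

// r2 starts while r1 is still in flight, so it is parallel;
// r3 starts after both have ended, so it is sequential.
const r1 = {startTime: 0, endTime: 100};
const r2 = {startTime: 50, endTime: 120};
const r3 = {startTime: 150, endTime: 200};
console.log(categorize([r1, r2, r3]).parallel.length); // 1
console.log(categorize([r1, r2, r3]).sequential.length); // 2
```

Note that, as in the PR code, the first request of an overlapping pair is counted as sequential; only the later requests that land inside an earlier request's window count as parallel.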
Contributor

I've made a significant proposed improvement based on data observation. I will try to explain as much as possible in this PR: #8439.

* fix: improve parallel requests calculation

* fix test issue

* multiple parallelGroups testing

* calculateAverageSpeedWithCategories test

* categorizedRequests in it's own describe

* clean up duplicate tests

* check for when groupLabel is non-existent

* a case when groupLabel is non-existent

* more testing with effective latency.
Contributor

@rahimrahman rahimrahman left a comment


I've spent quite a bit of time updating tracking.ts in order to send telemetry data to the server. At this point, I think it is safe to say that it has been "tested", based on the changes and work I've done in my two PRs.

@rahimrahman rahimrahman requested a review from larkox January 7, 2025 22:55
Contributor

@larkox larkox left a comment


Looks great. A few comments here and there, but nothing big.

@@ -237,32 +239,51 @@ export async function resetMessageCount(serverUrl: string, channelId: string) {
}
}

export async function storeMyChannelsForTeam(serverUrl: string, teamId: string, channels: Channel[], memberships: ChannelMembership[], prepareRecordsOnly = false, isCRTEnabled?: boolean) {
const resolveAndflattenModels = async (operator: ServerDataOperator, modelPromises: Array<Promise<Model[]>>, description: string, prepareRecordsOnly: boolean) => {
Contributor

Suggested change
const resolveAndflattenModels = async (operator: ServerDataOperator, modelPromises: Array<Promise<Model[]>>, description: string, prepareRecordsOnly: boolean) => {
const resolveAndFlattenModels = async (operator: ServerDataOperator, modelPromises: Array<Promise<Model[]>>, description: string, prepareRecordsOnly: boolean) => {

if (prepareRecordsOnly) {
return {models: flattenedModels};
}
return resolveAndflattenModels(operator, modelPromises, 'storeAllMyChannels', prepareRecordsOnly);
Contributor

I always struggle a bit with how JavaScript handles this.

At this point, we are returning the promise and therefore exiting the try/catch block. So if an exception occurs inside the promise, it will not be handled here.

In this particular case, the catch will only catch the first promise resolution of prepareAllMyChannels, but not the resolution of resolveAndFlattenModels or the resolution of prepareChannels inside prepareAllMyChannels.

I think the linter understands you are inside a try/catch and allows you to do this:

Suggested change
return resolveAndflattenModels(operator, modelPromises, 'storeAllMyChannels', prepareRecordsOnly);
return await resolveAndflattenModels(operator, modelPromises, 'storeAllMyChannels', prepareRecordsOnly);
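The difference being discussed can be reproduced in a few lines (a minimal sketch; `failing`, `withAwait`, and `withoutAwait` are hypothetical names, not functions from this codebase):

```typescript
// An async helper that always rejects.
async function failing(): Promise<string> {
    throw new Error('boom');
}

// `return failing()` hands the still-pending promise back as-is;
// its rejection happens after we've already left the try/catch.
async function withoutAwait(): Promise<string> {
    try {
        return failing(); // rejection is NOT caught here
    } catch {
        return 'caught locally';
    }
}

// `return await failing()` unwraps the promise inside the try block,
// so the rejection surfaces here and the local catch handles it.
async function withAwait(): Promise<string> {
    try {
        return await failing(); // rejection IS caught here
    } catch {
        return 'caught locally';
    }
}

void (async () => {
    console.log(await withAwait()); // caught locally
    console.log(await withoutAwait().catch(() => 'caught by caller')); // caught by caller
})();
```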

Contributor

Sure, but would it get caught by the caller? In this case, /remote/channel.ts has a try/catch block there too.

Contributor

I've confirmed that the await is not needed, since the try..catch on the caller catches this.

Contributor

But should the caller be catching it? Shouldn't this function be the one catching it?

Contributor

The point is, there's a try..catch on the callers, so errors within the promise will bubble up and be caught by the callers.

Contributor

Looking at the function, there is nothing that shows it is going to throw. On the contrary, having the whole body of the function inside a try/catch block should be an indication that you don't expect this function to throw.

If the callers of this function have try/catch, great, it won't fail. But that doesn't mean this isn't wrong. On top of that, we will lose part of the context of what is failing, since the catch will be higher up instead of here, where it would clearly log that the problem is specifically in storeAllMyChannels.

Contributor

I made a new commit to add a try..catch in resolveAndFlattenModels. So if any of the models fail for whatever reason, we'll get a more isolated error output.


return {models: flattenedModels};
return resolveAndflattenModels(operator, modelPromises, 'storeMyChannelsForTeam', prepareRecordsOnly);
Contributor

Same comment about returning promises

Suggested change
return resolveAndflattenModels(operator, modelPromises, 'storeMyChannelsForTeam', prepareRecordsOnly);
return await resolveAndflattenModels(operator, modelPromises, 'storeMyChannelsForTeam', prepareRecordsOnly);

Comment on lines +430 to +432
for (const teamId of teamIdsSet) {
categoriesPromises.push(client.getCategories('me', teamId, groupLabel));
}
Contributor

If we have an absurd number of teams, we are flooding the network layer with getCategories calls. It is not common, but we have already had reports of people having 100+ teams.

Unless we create a priority queue down at the network layer, we may want to consider batching this so other processes can access the network layer before this finishes.
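One way to bound the concurrency, as suggested here, is to fire the requests in fixed-size batches (a hedged sketch, not the project's actual implementation; `inBatches` and `fetchCategories` are illustrative names):

```typescript
// Runs `fn` over `items` in batches of `batchSize`, so the network layer is
// never flooded by hundreds of simultaneous calls.
async function inBatches<T, R>(
    items: T[],
    batchSize: number,
    fn: (item: T) => Promise<R>,
): Promise<R[]> {
    const results: R[] = [];
    for (let i = 0; i < items.length; i += batchSize) {
        const batch = items.slice(i, i + batchSize);
        // Each batch resolves fully before the next one starts, leaving gaps
        // in which other callers can access the network layer.
        results.push(...await Promise.all(batch.map(fn)));
    }
    return results;
}

// Example: 100 team ids fetched 10 at a time (fetchCategories is hypothetical).
// const categories = await inBatches(teamIds, 10, (id) => fetchCategories(id));
```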

Contributor

Agreed, thanks for the observation. This is exactly what this PR is trying to establish: where the bottlenecks are, so we can learn from the data and prioritize the issues. We have a long way to go, but first, data.

Comment on lines +435 to +437
for (let i = 1; i <= remaining; i++) {
remainingMembershipsPromises.push(client.getAllMyChannelMembersFromAllTeams(i, General.CHANNEL_MEMBERS_CHUNK_SIZE, groupLabel));
}
Contributor

This for loop adds to the problem in my previous comment.

Comment on lines -85 to -90
const tabletDevice = isTablet();
const isActiveServer = (await getActiveServerUrl()) === serverUrl;
if (isActiveServer && tabletDevice && initialChannelId === currentChannelId) {
await markChannelAsRead(serverUrl, initialChannelId, false, groupLabel);
markChannelAsViewed(serverUrl, initialChannelId);
}
Contributor

Probably fine, but I wonder why we are removing this.

@rahimrahman rahimrahman added Build Apps for PR Build the mobile app for iOS and Android to test 3: QA Review Requires review by a QA tester labels Jan 9, 2025
@rahimrahman rahimrahman modified the milestones: v2.25.0, v2.24.0 Jan 9, 2025
@rahimrahman
Contributor

@yasserfaraazkhan I know you have so much on your plate already. This is another PR that I would love to get through for 2.25.0.

In config.json, please set the telemetry option CollectNetworkMetrics to true.

You should be able to see debug output once you've run the app with that setting enabled, or use Charles Proxy to see mobile telemetry sending data to the server, whether the server accepts it or not.

If you can spin up a server from my changes here: mattermost/mattermost#29601, you should be able to kill two birds with one stone: the request will show up in the log, and if you connect a Prometheus server to it, you should be able to see the output.

@rahimrahman
Contributor

/update-branch

@rahimrahman rahimrahman added Build Apps for PR Build the mobile app for iOS and Android to test and removed Build Apps for PR Build the mobile app for iOS and Android to test labels Jan 12, 2025
@amyblais amyblais added CherryPick/Approved Meant for the quality or patch release tracked in the milestone and removed CherryPick/Approved Meant for the quality or patch release tracked in the milestone labels Jan 13, 2025
Contributor

@yasserfaraazkhan yasserfaraazkhan left a comment


Verified using a /cloud instance with diagnostics enabled.
Checked the channel switching and opening threads flows; working as expected.

@amyblais amyblais added 4: Reviews Complete All reviewers have approved the pull request QA Review Done CherryPick/Approved Meant for the quality or patch release tracked in the milestone and removed 3: QA Review Requires review by a QA tester labels Jan 16, 2025
@rahimrahman rahimrahman force-pushed the request-metrics-improvements branch from 5f37436 to 57a432c Compare January 16, 2025 17:17
@rahimrahman rahimrahman added Build Apps for PR Build the mobile app for iOS and Android to test and removed Build Apps for PR Build the mobile app for iOS and Android to test labels Jan 16, 2025
@rahimrahman rahimrahman merged commit d1d3fda into main Jan 16, 2025
6 checks passed
@rahimrahman rahimrahman deleted the request-metrics-improvements branch January 16, 2025 20:55
@mattermost-build
Contributor

Cherry pick is scheduled.

@mattermost-build
Contributor

Error trying doing the automated Cherry picking. Please do this manually

+++ Updating remotes...
Fetching upstream
hostfile_replace_entries: mkstemp: Read-only file system
update_known_hosts: hostfile_replace_entries failed for /app/.ssh/known_hosts: Read-only file system
From github.com:mattermost/mattermost-mobile
 * [new branch]          fixTimezoneCrash -> upstream/fixTimezoneCrash
   2f3dfbbbf..d1d3fda83  main             -> upstream/main
   790219b9f..8f0328e14  mobile-draft-e2e -> upstream/mobile-draft-e2e
   22ed82051..1b5ec1e00  release-2.25     -> upstream/release-2.25
Fetching upstream
hostfile_replace_entries: mkstemp: Read-only file system
update_known_hosts: hostfile_replace_entries failed for /app/.ssh/known_hosts: Read-only file system
+++ Updating remotes done...
+++ Creating local branch automated-cherry-pick-of-mattermost-mobile-#8420-upstream-release-2.25-1737060933
Switched to a new branch 'automated-cherry-pick-of-mattermost-mobile-#8420-upstream-release-2.25-1737060933'
Branch 'automated-cherry-pick-of-mattermost-mobile-#8420-upstream-release-2.25-1737060933' set up to track remote branch 'release-2.25' from 'upstream'.

+++ About to attempt cherry pick of PR #8420 with merge commit d1d3fda83a6e5f561f4790ae3f375142d02f92c9.

CONFLICT (modify/delete): app/actions/websocket/posts.test.ts deleted in HEAD and modified in d1d3fda83 (Request metrics improvements (#8420)).  Version d1d3fda83 (Request metrics improvements (#8420)) of app/actions/websocket/posts.test.ts left in tree.
error: could not apply d1d3fda83... Request metrics improvements (#8420)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
!!! git cherry-pick failed

+++ Aborting in-progress git cherry-pick.

+++ Returning you to the master branch and cleaning up.

rahimrahman added a commit that referenced this pull request Jan 17, 2025
* Network metrics

* update network-client ref

* fix unit tests

* missing catch error parsing

* Replace API's to fetch all teams and channels for a user
Update groupLabel to use a type definition
Include effective Latency and Average Speed in the metrics
Include parallel and sequential request counts

* update network library and fix tests

* feat(MM-61865): send network request telemetry data to the server (#8436)

* feat(MM-61865): send network info telemetry data

* unit testing

* fix latency vs effectiveLatency

* cleanup test

* fix failing tests

* fix spelling error

* fix failing test

* more cleanup

* fix: improve parallel requests calculation (#8439)

* fix: improve parallel requests calculation

* fix test issue

* multiple parallelGroups testing

* calculateAverageSpeedWithCategories test

* categorizedRequests in it's own describe

* clean up duplicate tests

* check for when groupLabel is non-existent

* a case when groupLabel is non-existent

* more testing with effective latency.

* resolveAndFlattenModels with a capital F

* add try..catch within resolveAndFlattenModels

* update groupLabel to fix failing lint

* fix linting issue due to unknown group label

* forgot to push changes

* fix the indentation problem again

* add env var option for COLLECT_NETWORK_METRICS

---------

Co-authored-by: Rahim Rahman <[email protected]>
Co-authored-by: Mattermost Build <[email protected]>
@amyblais amyblais added CherryPick/Done Successfully cherry-picked to the quality or patch release tracked in the milestone and removed CherryPick/Approved Meant for the quality or patch release tracked in the milestone labels Jan 17, 2025
yasserfaraazkhan pushed a commit that referenced this pull request Jan 17, 2025